perm filename INDIA.PRO[ESS,JMC]1 blob
sn#005462 filedate 1971-04-15 generic text, type T, neo UTF8
00100 A Proposal for Getting the Scientific Literature into
00200 Computer Form by Keyboarding it in India
00300
00400 Background Facts
00500
00600 1. Eventually, there will be a library of all the world's
00700 literature in a computer file readable on consoles from anywhere.
00800 This is discussed more fully in
00900
01000 2. For the first time there is a file capable of holding
01100 large numbers of books at a reasonable cost, namely the trillion-bit
01200 Precision Instruments Laser File. It could store 250,000 books at a
01300 cost of $4.00 each.
01400
01500 3. One of these files will be connected to the ARPA network
01600 of research organizations in information processing techniques and
01700 will be available in March 1972 to the ARPA supported projects and
01800 some additional research organizations.
01900
02000 4. It has been proposed as an experiment to put a library of
02100 computer science on this file.
02200
02300 5. There exist at present about 100 display consoles
02400 suitable for reading literature on this file, but more can be added at
02500 about $700.00 apiece.
02600
02700 6. The largest expense in creating this library is getting
02800 the information into computerized form. The cost of doing this is
02900 said to be 75 cents to $1.00 per thousand characters by keyboarding
03000 in the U.S. Some OCR techniques promise 25 cents per thousand with
03100 a reduction to 10 cents for a large enough job.
03200
03300 7. Wages in India are about 1/10 of those in the U.S.
03400
03500 8. The U.S. has about a billion dollars in Indian rupees
03600 from PL 480 sales of grain that cannot be converted into dollars.
03700
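As a rough check on facts 2 and 6, the arithmetic can be sketched in Python. The capacity, book count, and cost rates are the figures quoted above. The 8 bits per character is an assumption (the memo does not state a character encoding), and the function and variable names are illustrative only.

```python
# Rough arithmetic behind facts 2 and 6. Figures quoted in the memo are marked
# "memo"; everything else is an assumption made for illustration.

FILE_CAPACITY_BITS = 10**12   # memo: "trillion bit" laser file
BOOKS_ON_FILE = 250_000       # memo: books the file could store
STORAGE_COST_PER_BOOK = 4.00  # memo: dollars per book stored
BITS_PER_CHAR = 8             # assumption: encoding is not stated in the memo

bits_per_book = FILE_CAPACITY_BITS // BOOKS_ON_FILE
chars_per_book = bits_per_book // BITS_PER_CHAR
implied_file_cost = BOOKS_ON_FILE * STORAGE_COST_PER_BOOK

# memo: dollars per thousand characters for each input method
RATES_PER_THOUSAND = {
    "U.S. keyboarding (low)": 0.75,
    "U.S. keyboarding (high)": 1.00,
    "OCR": 0.25,
    "OCR (large job)": 0.10,
}

def input_cost_per_book(rate_per_thousand, chars=chars_per_book):
    """Dollars to get one book of `chars` characters into the file."""
    return rate_per_thousand * chars / 1000

print(f"bits per book:     {bits_per_book:,}")
print(f"chars per book:    {chars_per_book:,}")
print(f"implied file cost: ${implied_file_cost:,.0f}")
for method, rate in RATES_PER_THOUSAND.items():
    print(f"{method:24s} ${input_cost_per_book(rate):,.2f} per book")
```

Under these assumptions a book is about 500,000 characters, and keyboarding it in the U.S. would cost $375 to $500, roughly a hundred times the $4.00 it costs to store. This is the sense in which input is the largest expense.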
03800 On the basis of the above facts we have the following
03900 proposal:
04000
04100 1. The U.S. spend about $1,000,000 to $2,000,000 in dollars
04200 and from $10,000,000 to $20,000,000 in blocked rupees to put books
04300 and reports, primarily technical, into computerized form.
04400
04500 2. The project be carried out in Bombay and be directed by
04600 personnel from the Tata Institute of Fundamental Research.
04700
04800 3. The Tata Institute do the R & D associated with the
04900 project. This includes developing machine formats for the different
05000 kinds of textual information, mathematical formulas, pictures, and
05100 diagrams and also display formats. It also includes techniques for
05200 making sure the work is done correctly by proofreading or verifying
05300 or computer syntax checking.
05400
05500 4. Tata gets a PDP-10 computer with multi-console display
05600 equipment to do the research. The actual keyboarding is done either
05700 on keypunches (purchasable for rupees) or on time shared PDP-11's
05800 with keyboards.
05900
06000 5. The benefit to the U.S. is that we get the literature into
06100 computerized form cheaply.
06200
06300 6. The benefits to India are a) they pay off some of the
06400 debt, b) they get a substantial R & D contract in a leading area of
06500 computer science, and c) they get a first-class computer science
06600 research computer.
06700
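Item 3 mentions verifying as one way to make sure the work is done correctly. One common form of verification is to have two operators key the same page independently and to flag every place where the two versions disagree, so that a proofreader looks only at those places. The sketch below illustrates that idea with Python's standard difflib module. The function name and the sample text are hypothetical and are not part of the proposal.

```python
import difflib

def keying_discrepancies(first_pass, second_pass):
    """Return (position, first, second) for lines where two keyings differ.

    Both arguments are lists of text lines keyed independently from the same
    page. Positions are 1-based offsets into the first keying. A line that is
    present in only one keying is paired with None.
    """
    matcher = difflib.SequenceMatcher(a=first_pass, b=second_pass, autojunk=False)
    problems = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Pair up the disagreeing lines within this differing region.
        span = max(i2 - i1, j2 - j1)
        for k in range(span):
            a = first_pass[i1 + k] if i1 + k < i2 else None
            b = second_pass[j1 + k] if j1 + k < j2 else None
            problems.append((i1 + k + 1, a, b))
    return problems

# Hypothetical sample: the second operator transposed two letters.
operator_a = ["The cost of keyboarding is", "about $1.00 per thousand", "characters."]
operator_b = ["The cost of keyboarding is", "about $1.00 per thousnad", "characters."]

for position, a, b in keying_discrepancies(operator_a, operator_b):
    print(f"line {position}: {a!r} != {b!r}")
```

Comparing two keyings catches only errors that the operators do not share. Systematic mistakes, such as a formula convention both operators misread, would still need proofreading or syntax checking.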
06800 Some answers to questions.
06900
07100 1. How much literature can be punched? At $3.00 x 10^-4 per
07300 character, $10^7 gives about 3 x 10^10 characters = 60,000 books.
07400
07500 2. What about copyright? Permission of copyright holders
07600 should be requested on the basis that no payment be made for putting
07700 the information into the system; payment is for reference to the
07800 information on the basis of usage at rates to be negotiated later;
07900 probably a flat rate per look at a page for old material and rates
08000 set by the copyright holder after a library system is operational for
08100 new material. If the copyright holder refuses, his material will not
08200 be entered. If he wants to put it into the library later, he will
08300 have to do so at his own expense. Few will refuse.
08500
08600 3. Can the project be scaled down? The cost of the R & D to
08700 develop internal formats and means of supervision places a limit on
08800 how much the project could be scaled down and still be meaningful.
08900
09000 4. Who has to agree? ARPA, most likely PSAC, the State
09100 Department, the Office of Management and Budget, maybe Congress, the
09200 Government of India and the Tata Institute.
09300
09400 5. What variations are possible? A different Indian
09500 organization, the scale of the project, a different country with low
09600 wages where the U.S. has blocked currency and where the necessary
09700 computer competence exists, the extent of participation of U.S.
09800 organizations in the R & D and supervision, possible use of a private
09900 Indian concern, doing the whole job by OCR (optical character
10000 recognition by machine).
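
The estimate in answer 1 can be checked with a few lines of Python. The rate is taken here as $3 x 10^-4 per character, the value that reproduces the stated total of about 3 x 10^10 characters for $10^7. The figure of 500,000 characters per book is not stated in the memo; it is the size implied by equating 3 x 10^10 characters with 60,000 books. All names are illustrative.

```python
# Check of the estimate in answer 1. The rate and book size are assumptions
# chosen to match the totals stated in the memo.

COST_PER_CHAR = 3.0e-4      # dollars per character (assumed; see lead-in)
BUDGET = 1.0e7              # dollars' worth of keyboarding
CHARS_PER_BOOK = 500_000    # assumed: 3 x 10^10 chars / 60,000 books

def books_for_budget(budget, cost_per_char=COST_PER_CHAR,
                     chars_per_book=CHARS_PER_BOOK):
    """Return (characters, books) that `budget` dollars will pay for."""
    chars = budget / cost_per_char
    return chars, chars / chars_per_book

chars, books = books_for_budget(BUDGET)
print(f"characters: {chars:.2e}")
print(f"books:      {books:,.0f}")
```

The unrounded result is about 3.3 x 10^10 characters, or roughly 67,000 books. The memo's 60,000 follows from rounding the character count down to 3 x 10^10 before dividing.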